Abstract — Control-based approaches to grasp synthesis create
grasping behavior by sequencing and combining control primi- 
tives. In the absence of any other structure, these approaches 
must evaluate a large number of feasible control sequences 
as a function of object shape, object pose, and task. This 
work explores a new approach to grasp synthesis that limits 
consideration to variations on a generalized localize-reach-grasp 
control policy. A new learning algorithm, known as schema
structured learning, is used to learn which instantiations of the
generalized policy are most likely to lead to a successful grasp in
different problem contexts. Two experiments are described in which
Dexter, a bimanual upper-torso humanoid, learns to select an appropriate
grasp strategy as a function of object eccentricity and orientation.
In addition, it is shown that grasp skills learned in this way 
can generalize to new objects. Results are presented showing 
that after learning how to grasp a small, representative set 
of objects, the robot’s performance quantitatively improves for 
similar objects that it has not experienced before. 

I. Introduction 

In the control-based approach to grasp synthesis, complex 
grasping behavior is represented in terms of simpler reaching 
and grasping primitives. For example, in order to pick up an 
object, a robot can execute a localize controller, followed by 
a reach controller and a grasp controller. Reach primitives 
move the manipulator to a reference pose derived from visual 
information or prior knowledge regarding the target object. 
Grasp primitives (i.e. grasp controllers) displace manipulator 
contacts based on tactile feedback so as to optimize the 
grasp [1], [2]. However, in order to be successful, the robot 
must select and parameterize reach and grasp controllers as 
a function of grasp context. This work explores an approach to autonomously learning a generalizable mapping from object shape and pose to controller parameterizations that result in a successful grasp.

One of the distinctive features of this approach is that the ro- 
bot simultaneously learns a qualitative grasp strategy alongside 
a quantitative manipulator pose relative to the object. Many 
previous approaches to grasp learning in the literature have 
focused on one or the other of these problems. Cutkosky and 
Howe proposed an expert system (Grasp-Exp) that selected 
a robot grasp based on object and task attributes [3]. Based 
on observations regarding how manufacturing workers used 
different tools, the system was able to use tool and task 
criteria to select a grasp strategy from a predefined menu. In 
an attempt to develop an association that generalizes to new objects, Iberall used a neural network to learn an association between objects and grasp types, expressed as an appropriate selection of palm or (finger) pad opposition [4]. In the above approaches,
the robot learns a qualitative grasp strategy that is not grounded 
in robust closed-loop controllers. In contrast, the qualitative 
choices that the robot learns to make in this work are actually 
parameterizations that directly specify how the reach or grasp 
controllers should behave. 

Other researchers considered the problem of learning the 
correct manipulator or contact pose as a function of ob- 
ject characteristics. Moussa proposed an approach where the 
system learns an object-centric homogeneous transform that 
correctly positions the gripper based on trial-and-error experi- 
ence [5]. Kamon, Flash, and Edelman described experiments 
where a parallel jaw gripper learns the relationship between 
features derived from a two-dimensional visual object outline 
and desired grasp points [6]. Both of these approaches to grasp 
learning assume that only one type of reach and one type of 
grasp will occur. In contrast, this paper’s approach allows the 
robot to learn to select between multiple qualitatively different 
reaches and grasps. 

Learning the mapping from object shape and pose to reach 
and grasp strategies can also be viewed as an instance of 
affordance learning. Gibson defined an affordance to be some 
aspect of the environment that an agent is able to make use 
of [7]. In the case of grasping, an object affords a grasp if 
the agent is able to pick it up. The different ways in which 
an object can be grasped are the object’s grasp affordances. 
De Granville et al. showed that grasp affordances can be 
represented as parametric probability distributions learned 
from human grasp data [8]. Stoytchev proposed an approach 
to learning tool affordances autonomously whereby the robot 
discovers how different tools can be used to push an object 
around on a table [9]. In this work, the set of viable grasp 
strategies is represented by controller parameterizations and 
a non-parametric distribution over object-centric manipulator 
poses. 

This paper explores a new approach to learning a mapping from object characteristics to the set of reach and grasp controllers that are likely to result in a good grasp. It uses a controller representation known as the control basis (an overview is given in Section II), which can represent qualitatively different types of reach and grasp controllers by parameterizing reach and grasp artificial potential functions differently [10]. To learn grasp strategy as a function of object parameters, this paper uses schema structured learning, a new machine learning algorithm (discussed in Section III) introduced in a previous paper [11]. Whereas the previous paper focused
on the schema structured learning algorithm and gave only 
brief grasp results, Section IV of the current paper focuses 
more on the application of schema structured learning to grasp 
affordance learning. This paper gives more detail regarding 
how the target object is visually characterized and what kind 
of controllers implement the reach-grasp behavior. Section V 
shows that the schema structured learning approach can learn 
to select a qualitatively different reach strategy based on object 
eccentricity and a quantitatively different reference pose as a 
function of object orientation. In addition, it is shown that the 
general representation of object and grasp strategy enables the 
robot to improve its grasp performance measurably on objects 
it has no experience with, extrapolating from a relatively small 
set of training objects. 

II. The Control Basis Approach 

When using a control-based approach to solve multi-step 
tasks, a framework is needed that allows controllers to be 
sequenced in an organized way. The control basis framework 
accomplishes this by organizing the set of viable controllers 
and providing a robust way of evaluating system state [12]. 

The control basis can systematically specify an arbitrary 
closed-loop controller by matching an artificial potential func- 
tion with a sensor transform and effector transform [12]. The 
potential function specifies controller objectives, the effector 
transform specifies what degrees of freedom the controller 
uses, and the sensor transform implements the controller 
feedback loop and specifies the controller reference. In the 
following, a controller will sometimes be identified by its ar- 
tificial potential. In these cases, the artificial potential is written 
in small caps. For example, consider a reach controller. The 
sensor transform specifies which part of the manipulator is to 
reach and where that part must reach to. The effector transform 
specifies what degrees of freedom are used to accomplish the 
task. 

In general, the control basis realizes a complete controller by selecting one potential function from a set $\Phi = \{\phi_1, \phi_2, \ldots\}$, one sensor transform from a set $\Sigma = \{\sigma_1, \sigma_2, \ldots\}$, and one effector transform from a set $T = \{\tau_1, \tau_2, \ldots\}$. Given $\Phi$, $\Sigma$, and $T$, the set of controllers that may be generated is $\Pi \subseteq \Phi \times \Sigma \times T$. When specifying a fully instantiated controller, the notation $\phi_i|^{\sigma}_{\tau}$ denotes the controller constructed by parameterizing potential function $\phi_i$ with sensor transform $\sigma$ and effector transform $\tau$. When the controller has a non-zero reference, $x$, the sensor will be written $\sigma(x)$.
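To make the combinatorics of $\Pi \subseteq \Phi \times \Sigma \times T$ concrete, the following Python sketch enumerates the Cartesian product of potentials, sensor transforms, and effector transforms and keeps only compatible triples. The `Controller` class, the compatibility rule, and the string labels are hypothetical illustrations; the paper does not specify a data structure or a compatibility test.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Controller:
    """A fully instantiated controller phi|^sigma_tau (hypothetical representation)."""
    potential: str  # phi: artificial potential (controller objective)
    sensor: str     # sigma: feedback signal and controller reference
    effector: str   # tau: degrees of freedom the controller uses


def enumerate_controllers(potentials, sensors, effectors, compatible):
    """Return Pi: the triples in Phi x Sigma x T accepted by a compatibility predicate."""
    return [
        Controller(phi, sigma, tau)
        for phi, sigma, tau in product(potentials, sensors, effectors)
        if compatible(phi, sigma, tau)
    ]


def arm_or_hand_compatible(phi, sigma, tau):
    """Hypothetical rule: reach uses arm sensors/effectors, grasp uses hand ones."""
    domain = {"reach": "arm", "grasp": "hand"}[phi]
    return sigma.startswith(domain) and tau.startswith(domain)


controllers = enumerate_controllers(
    ["reach", "grasp"],
    ["arm_pose", "hand_tactile"],
    ["arm_joints", "hand_joints"],
    arm_or_hand_compatible,
)
for c in controllers:
    print(c)
```

Of the eight triples in the product, only two survive the (assumed) compatibility rule, which illustrates why $\Pi$ is generally a proper subset of $\Phi \times \Sigma \times T$.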

The control basis framework allows composite controllers to be constructed that execute multiple constituent controllers concurrently. Each constituent controller is assigned a priority, and controllers with lower priority are executed in the null space of controllers with higher priority. Composite controllers are denoted $\phi_b|^{\sigma_b}_{\tau_b} \triangleleft \phi_a|^{\sigma_a}_{\tau_a}$, where $\phi_b|^{\sigma_b}_{\tau_b}$ is said to execute “subject-to” (i.e., in the null space of) $\phi_a|^{\sigma_a}_{\tau_a}$.
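The paper does not state how the subordinate controller is projected into the null space of the primary one. A common realization, assumed here for illustration, projects the subordinate controller's joint-space step through $I - J_a^{+} J_a$, where $J_a$ is the primary task Jacobian and $J_a^{+}$ its pseudoinverse. The function name `subject_to` and the toy Jacobian are hypothetical.

```python
import numpy as np


def subject_to(dq_secondary, dq_primary, J_primary):
    """Combine two joint-space steps so that dq_secondary acts only in the null
    space of the primary task (an assumed realization of phi_b <| phi_a)."""
    J_pinv = np.linalg.pinv(J_primary)
    n = J_primary.shape[1]
    N = np.eye(n) - J_pinv @ J_primary  # null-space projector of J_primary
    return dq_primary + N @ dq_secondary


# Toy 3-DOF example: the primary task constrains only the first joint.
J_primary = np.array([[1.0, 0.0, 0.0]])
dq_primary = np.array([0.5, 0.0, 0.0])
dq_secondary = np.array([0.3, 0.2, -0.1])
dq = subject_to(dq_secondary, dq_primary, J_primary)
print(dq)  # the secondary step's first-joint component is removed
```

Because the projected secondary step satisfies $J_a \, (N \dot q_b) = 0$, the subordinate controller cannot disturb the primary controller's progress, which is the intended meaning of "subject-to".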



Fig. 1. Projecting the abstract policy onto the underlying state-action space: Assume that the robot is in state $s_2$. The state mapping, $f$, projects this to abstract state $s'_2$. The abstract policy specifies that abstract action $a'_2$ is to be taken next. The inverse action mapping, $g^{-1}$, projects $a'_2$ back onto the set of feasible action instantiations.

The control basis approach measures system state in terms 
of controller dynamics. At any point in time, the instanta- 
neous error and the instantaneous gradient of error can be 
evaluated. Although the more general system dynamics can 
be treated [13], the current work only considers controller 
convergence to establish system state. Controller error is calculated by evaluating the controller's potential function $\phi$ for a particular sensor transform $\sigma$. Let $\mathcal{R} \subseteq \Phi \times \Sigma$ be the set of compatible potential functions and sensors. System state is defined to be the elements of $\mathcal{R}$ that are converged with low error:

$$s_k = \{(\phi_i, \sigma_j) \in \mathcal{R} \mid \phi_i \text{ is converged for } \sigma_j \text{ with low error}\}. \qquad (1)$$

The set of all states that can be represented this way is the power set $2^{\mathcal{R}}$. For example, if the system is in state $s_k \subseteq \mathcal{R}$, then $(\phi_i, \sigma_j) \in s_k$ when $\phi_i$ is converged for $\sigma_j$. If $(\phi_i, \sigma_j) \notin s_k$, then $\phi_i$ is not converged for $\sigma_j$.
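Equation 1 amounts to filtering the compatible (potential, sensor) pairs by their convergence status. The sketch below assumes, hypothetically, that a pair counts as "converged with low error" when both its potential value and the magnitude of its gradient fall below fixed thresholds; the paper does not give the thresholds or the exact convergence test, and the pair labels are placeholders.

```python
import math


def system_state(status, err_tol=0.05, grad_tol=1e-3):
    """Return s_k: the set of (potential, sensor) pairs converged with low error.

    status maps (phi, sigma) -> (error, gradient vector). Thresholds are assumptions.
    """
    state = set()
    for pair, (error, gradient) in status.items():
        grad_norm = math.sqrt(sum(g * g for g in gradient))
        if error < err_tol and grad_norm < grad_tol:
            state.add(pair)
    return frozenset(state)


status = {
    ("phi_l", "sigma_l"): (0.01, [0.0, 0.0]),     # converged with low error
    ("phi_pos", "sigma_p"): (0.40, [0.2, -0.1]),  # still descending its potential
    ("phi_g", "sigma_g"): (0.30, [0.0, 0.0]),     # converged, but to a high-error minimum
}
s_k = system_state(status)
print(sorted(s_k))  # [('phi_l', 'sigma_l')]
```

The third pair shows why both conditions matter: a controller can stop moving (zero gradient) without reaching a low-error configuration, and such a pair is not part of the state.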

III. Schema Structured Learning 

Without any structure, the control-basis approach would 
require a robot to search through a large space of possible 
controller combinations and sequences in order to determine 
which ones are likely to generate the desired behavior. Because 
this can require extensive experience and long learning times, 
it is frequently useful to constrain the kinds of policies that 
the system is allowed to consider. This can be accomplished 
by carefully designing the state and action space such that 
the resulting learning problem is tractable. Schema structured 
learning is a way of implicitly constraining the state and action 
space by restricting consideration to variations of a generalized 
solution, represented by an action schema [11]. 

An action schema is a tuple, $\mathcal{S} = (S', A', \pi', T')$, where $S'$ and $A'$ are an abstract state and action space, $\pi' : S' \to A'$ is an abstract policy, and $T' : S' \times A' \to S'$ is an abstract transition function that encodes desired transition behavior. It is assumed that the robot operates in an underlying Markov state and action space, $S$ and $A$, but that a mapping exists between the underlying and abstract state and action spaces. The abstract policy, $\pi'$, is a generalized solution, defined in the abstract space, that has many policy instantiations in the underlying space. These policy instantiations are defined in terms of state and action mappings, $f : S \to S'$ and $g : A \to A'$, that assign each underlying state and action to an abstract state and action. The set of policy instantiations is the set of policies $\pi$ that satisfy

$$\forall s_t \in S, \quad \pi(s_t) \in g^{-1}(\pi'(f(s_t))), \qquad (2)$$

where $g^{-1}(a') = \{a \in A \mid g(a) = a'\}$ is the inverse of $g$. This is illustrated in Figure 1. Suppose that the robot is in state $s_2 \in S$. The state mapping, $f(s_2) = s'_2$, projects this state onto $s'_2 \in S'$. From this abstract state, the abstract policy takes abstract action $a'_2$, i.e., $\pi'(s'_2) = a'_2$. Finally, the inverse action mapping, $g^{-1}$, projects this abstract action onto a set of action choices, $g^{-1}(a'_2) = \{a_1, \ldots, a_n\}$.
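The walk-through of Figure 1 can be mirrored directly in code: map the underlying state through $f$, look up the abstract action with $\pi'$, and invert $g$ over a finite action set. The sketch below is illustrative only; the action tuples, the state dictionary, and the abstract labels are hypothetical placeholders rather than the paper's representation.

```python
def candidate_actions(s, f, abstract_policy, g, actions):
    """Return g^{-1}(pi'(f(s))): the underlying actions consistent with pi' in state s."""
    a_abstract = abstract_policy[f(s)]
    return {a for a in actions if g(a) == a_abstract}


# Hypothetical example: underlying actions are (potential, parameter) pairs,
# g strips the parameter, and f labels the underlying state abstractly.
def g(action):
    return action[0]


def f(state):
    return "localized" if state["object_found"] else "start"


abstract_policy = {"start": "localize", "localized": "reach"}
actions = {("localize", None), ("reach", 0.0), ("reach", 0.5), ("reach", 1.0)}

B = candidate_actions({"object_found": True}, f, abstract_policy, g, actions)
print(sorted(B))  # [('reach', 0.0), ('reach', 0.5), ('reach', 1.0)]
```

Every element of the returned set is a valid next step of some policy instantiation; choosing among them is the learning problem addressed next.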

The goal of schema structured learning is to discover the policy instantiation(s) that maximize the probability of meeting the transition constraints encoded by $T'$. When executing underlying action $a \in A$ from state $s_t \in S$, the next state, $s_{t+1} \in S$, must satisfy

$$s_{t+1} \in f^{-1}(T'(f(s_t), g(a))), \qquad (3)$$

where $f^{-1}(s') = \{s \in S \mid f(s) = s'\}$ is the inverse of $f$. As long as action $a \in A$ causes the robot to transition to one of these next states, the action is said to succeed. Otherwise, the action fails. If an entire sequence of actions in a policy instantiation succeeds, then the policy instantiation will be said to succeed. An optimal policy instantiation, $\pi^*$, is one which maximizes the probability of success. Let $P_\pi(a \mid s_t)$ be the probability of a successful policy instantiation given that the system takes action $a \in A$, starting in state $s_t \in S$, and follows policy instantiation $\pi$ after that. If $\Pi$ is defined to be the set of all possible policies, then

$$P^*(a \mid s_t) = \max_{\pi \in \Pi} P_\pi(a \mid s_t) \qquad (4)$$

is the maximum probability of a successful trajectory taken over all possible policies. This allows the optimal policy to be calculated using

$$\pi^*(s_t) = \arg\max_{a \in B(s_t)} P^*(a \mid s_t), \qquad (5)$$

where $B(s_t) = g^{-1}(\pi'(f(s_t)))$ (see Equation 2) is the set of actions that are consistent with the abstract policy when the system is in state $s_t \in S$.
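Once estimates of $P^*(a \mid s_t)$ are available, Equation 5 is an argmax restricted to $B(s_t)$. The sketch below assumes the estimates for the current state are stored in a dictionary keyed by action, and that actions without an estimate default to a prior of 0.5; both choices, and the action labels, are illustrative rather than taken from the paper.

```python
def optimal_action(B, p_star, prior=0.5):
    """pi*(s_t) = argmax over a in B(s_t) of P*(a | s_t); unseen actions use a prior."""
    return max(B, key=lambda a: p_star.get(a, prior))


# Hypothetical estimates for one state s_t.
p_star = {"reach_pos_only": 0.35, "reach_pos_and_rot": 0.9}
B = ["reach_pos_only", "reach_pos_and_rot"]
print(optimal_action(B, p_star))  # reach_pos_and_rot
```

Restricting the maximization to $B(s_t)$ rather than all of $A$ is what keeps the search inside the action schema.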

Given an action schema and the appropriate mappings, schema structured learning discovers the optimal policy instantiation online through a trial-and-error process. The algorithm gains experience by repeatedly executing policy instantiations of the action schema. While the system initially executes random instantiations of the abstract policy, performance quickly improves. Through experience, the system develops increasingly accurate approximations of the probability that a given action will succeed from a given state. The algorithm uses dynamic programming to estimate the set of optimal policy instantiations. For algorithmic details regarding schema structured learning, see [11].
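The trial-and-error loop described above can be sketched as a tabular estimator of $P(\text{success} \mid s, a)$ that starts by exploring and gradually favors actions that have succeeded. This is not the algorithm of [11]: the Laplace-smoothed counts, the epsilon-greedy exploration rule, the class name `SuccessEstimator`, and the simulated success rates are all assumptions made for illustration.

```python
import random
from collections import defaultdict


class SuccessEstimator:
    """Tabular estimate of P(success | s, a) learned by trial and error.

    Laplace smoothing and epsilon-greedy exploration are illustrative assumptions.
    """

    def __init__(self, epsilon=0.2, seed=0):
        self.successes = defaultdict(int)
        self.attempts = defaultdict(int)
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def p_success(self, s, a):
        # (successes + 1) / (attempts + 2): 0.5 for an untried action
        return (self.successes[(s, a)] + 1) / (self.attempts[(s, a)] + 2)

    def choose(self, s, candidates):
        candidates = list(candidates)
        if self.rng.random() < self.epsilon:
            return self.rng.choice(candidates)  # explore a random instantiation
        return max(candidates, key=lambda a: self.p_success(s, a))

    def update(self, s, a, succeeded):
        self.attempts[(s, a)] += 1
        self.successes[(s, a)] += int(succeeded)


# Simulated trials with hypothetical true success rates for two reach types.
true_p = {"pos_only": 0.3, "pos_and_rot": 0.9}
est = SuccessEstimator(seed=1)
world = random.Random(42)
for _ in range(500):
    a = est.choose("eccentric", true_p)
    est.update("eccentric", a, world.random() < true_p[a])

best = max(true_p, key=lambda a: est.p_success("eccentric", a))
print(best)
```

Exploration guarantees that every candidate keeps being tried occasionally, so an early lucky success with a poor action cannot lock the estimator in permanently.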

In addition to structuring the solution space, an important characteristic of schema structured learning is that it can be used when the underlying state and action space is large or real-valued. A sample-based approach can be used to approximate the probability distribution of transition success because the distribution is binomial instead of multinomial, i.e., the algorithm estimates $P(\text{success} \mid s_t, a)$ instead of $P(s_{t+1} \mid s_t, a)$. When the underlying state and action space is real-valued, the action schema can have a large or infinite number of policy instantiations. However, instead of maximizing over a large or infinite set of actions in Equation 5, it is possible to evaluate only elements of a finite sample set. As the algorithm gains experience, its estimate of $P^*(a \mid s_t)$ improves, and the algorithm can resample the action set so that it more densely represents actions likely to succeed.
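As one possible realization of the sample-based idea (not necessarily the estimator used in [11]), the sketch below estimates $P(\text{success} \mid a)$ for a real-valued action, such as the reach offset $x \in [0, 1]$, with a Gaussian-kernel-weighted success rate, and then resamples candidate actions in proportion to that estimate. The kernel bandwidth, the Beta(1, 1) prior, the perturbation noise, and the example outcomes are assumptions.

```python
import numpy as np


def estimate_success(a, tried, outcomes, bandwidth=0.1):
    """Kernel-weighted estimate of P(success | a) from past (action, outcome) samples."""
    tried = np.asarray(tried, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    w = np.exp(-0.5 * ((tried - a) / bandwidth) ** 2)
    # Beta(1, 1) prior keeps the estimate at 0.5 when there is no nearby data
    return (w @ outcomes + 1.0) / (w.sum() + 2.0)


def resample(candidates, tried, outcomes, n, rng, noise=0.05):
    """Draw n new candidate actions concentrated near actions likely to succeed."""
    p = np.array([estimate_success(a, tried, outcomes) for a in candidates])
    idx = rng.choice(len(candidates), size=n, p=p / p.sum())
    new = np.asarray(candidates)[idx] + rng.normal(0.0, noise, size=n)
    return np.clip(new, 0.0, 1.0)  # keep offsets in the valid range [0, 1]


rng = np.random.default_rng(0)
tried = [0.1, 0.2, 0.8, 0.9]
outcomes = [0, 0, 1, 1]  # hypothetical: reaches far along the axis succeeded
candidates = np.linspace(0.0, 1.0, 11)
new = resample(candidates, tried, outcomes, n=2000, rng=rng)
print(new.mean())  # expected to exceed 0.5: samples shift toward the successful region
```

Because only a binary outcome is modeled per sample, the estimate needs far less data than a full next-state distribution would, which is the argument the paragraph above makes for the binomial formulation.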

IV. The Localize-Reach-Grasp Action Schema 

The LOCALIZE-REACH-GRASP action schema is used to 
model grasping behavior. This action schema maps onto the 
controllers described in this section. 

A. Controllers 

1) Localize: The localize controller visually characterizes the object to be grasped in terms of a small number of parameters. First, the object is segmented from the background in both image planes. Next, the three-dimensional Cartesian object location is determined by triangulating on the centroid of the “blob” in each image plane. LOCALIZE then calculates the eigenvalues and eigenvectors of the covariance matrix describing the blob in each image plane. Essentially, this step characterizes the object as an ellipsoid, as illustrated in Figure 2. By triangulating on one end of the object ellipsoid, LOCALIZE calculates the three-dimensional Cartesian position, length, orientation, and eccentricity of the object.
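The per-image step of LOCALIZE can be sketched as follows: compute the centroid and pixel covariance of a segmented binary blob, and read the major-axis direction and an elongation measure off its eigen-decomposition. This covers a single image plane only (the stereo triangulation is omitted), and defining eccentricity as the conic eccentricity $\sqrt{1 - (\lambda_{\min}/\lambda_{\max})}$ of the fitted ellipse is an assumption, since the paper does not define it.

```python
import numpy as np


def blob_ellipse(mask):
    """Fit an ellipse to a binary blob via the eigen-decomposition of its pixel covariance.

    Returns the centroid (row, col), the major-axis direction, the major/minor
    standard deviations, and the (assumed) conic eccentricity.
    """
    rows, cols = np.nonzero(mask)
    pts = np.stack([rows, cols], axis=1).astype(float)
    centroid = pts.mean(axis=0)
    cov = np.cov(pts, rowvar=False)
    evals, evecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    major_dir = evecs[:, 1]             # eigenvector of the largest eigenvalue
    major, minor = np.sqrt(evals[1]), np.sqrt(evals[0])
    eccentricity = np.sqrt(1.0 - (minor / major) ** 2)
    return centroid, major_dir, major, minor, eccentricity


mask = np.zeros((60, 60), dtype=bool)
mask[10:50, 25:35] = True  # a tall, thin silhouette, e.g. a vertical towel roll
c, d, major, minor, e = blob_ellipse(mask)
print(c, np.abs(d), round(e, 2))
```

A round object yields nearly equal eigenvalues and an eccentricity near zero, while an elongated object yields an eccentricity near one; this is the distinction exploited in the eccentricity experiment of Section V.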

2) Reach: Reach controllers are referenced with respect to the last object detected by the LOCALIZE controller. The reach controllers are based on two artificial potentials: reach-to-position, $\phi_{pos}$, and reach-to-orientation, $\phi_{rot}$. $\phi_{pos}$ is parameterized only by a sensor transform, $\sigma_p(y, x)$, and an effector transform, $\tau_p(y)$. These transforms are parameterized by a set of manipulator control points, $y$, and a control reference offset, $x$. The fully instantiated position control primitive, $\phi_{pos}|^{\sigma_p(y,x)}_{\tau_p(y)}$, moves the $y$ manipulator control points to a point along the object's major axis, a fraction $x$ of the way from the middle to one end of the major axis. The reach-to-orientation artificial potential, $\phi_{rot}$, is parameterized by $\sigma_r(y, \theta)$ and $\tau_r(y)$. The fully instantiated rotation control primitive, $\phi_{rot}|^{\sigma_r(y,\theta)}_{\tau_r(y)}$, orients the $y$ manipulator control points to an offset of $\theta$ from the object's major axis. Each contact is associated with a line from the contact frame centroid through the contact itself; the controller orients the manipulator so that the average angle between each contact's line and the object major axis (for the $y$ set of contacts) is $\theta$.
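The two reach references can be sketched numerically: the position reference is the point a fraction $x$ of the way from the object's middle to one end along the major axis, and the orientation error is measured through the average angle between each contact line and that axis. The function names, the use of an explicit `half_length`, and folding angles into $[0, \pi/2]$ (matching $\theta \in [0, \pi/2]$ in Equation 8) are assumptions made for illustration.

```python
import numpy as np


def reach_position(center, axis_dir, half_length, x):
    """Reference point on the major axis, a fraction x of the way from the
    middle (x = 0) to one end (x = 1) of the object."""
    axis_dir = np.asarray(axis_dir, dtype=float)
    axis_dir = axis_dir / np.linalg.norm(axis_dir)
    return np.asarray(center, dtype=float) + x * half_length * axis_dir


def mean_contact_angle(contact_lines, axis_dir):
    """Average angle in [0, pi/2] between each contact's line and the major axis,
    i.e., the quantity that the rotation controller drives toward theta."""
    a = np.asarray(axis_dir, dtype=float)
    a = a / np.linalg.norm(a)
    angles = []
    for line in contact_lines:
        l = np.asarray(line, dtype=float)
        l = l / np.linalg.norm(l)
        # abs() makes the angle independent of the sign of the line and the axis
        angles.append(np.arccos(np.clip(abs(l @ a), 0.0, 1.0)))
    return float(np.mean(angles))


# Hypothetical vertical object centered 10 cm above the table, 20 cm long.
print(reach_position([0.0, 0.0, 0.1], [0.0, 0.0, 1.0], 0.1, 0.5))  # point 5 cm above center
print(mean_contact_angle([[1, 0, 0], [-1, 0, 0]], [0, 0, 1]))        # pi/2: perpendicular
```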




Fig. 2. The robot characterizes objects in terms of an ellipsoid fit to the segmented object. (a) and (b) illustrate the left and right camera views of a squirt bottle; (c) and (d) illustrate the corresponding segmented “blobs” and their ellipsoids.


Recall that the control basis allows control primitives to be combined using the subject-to relation, $\triangleleft$, to create composite controllers. The current work only allows two combinations:

$$\phi_{pos}|^{\sigma_p(y,x)}_{\tau_p(y)}$$

and

$$\phi_{rot}|^{\sigma_r(y,\theta)}_{\tau_r(y)} \triangleleft \phi_{pos}|^{\sigma_p(y,x)}_{\tau_p(y)}.$$

When $\phi_{pos}|^{\sigma_p(y,x)}_{\tau_p(y)}$ executes alone, the manipulator reaches toward a position without controlling for orientation. When $\phi_{rot}|^{\sigma_r(y,\theta)}_{\tau_r(y)} \triangleleft \phi_{pos}|^{\sigma_p(y,x)}_{\tau_p(y)}$ executes, the manipulator reaches toward both the specified position and rotation offsets.

3) Grasp: Grasp controllers displace contacts toward 
good grasp configurations using feedback control [1], [2]. This 
approach uses tactile feedback to calculate an error gradient 
and displace grasp contacts on the object surface without a 
geometric object model. After making light contact with the 
object using sensitive tactile load cells, the controller displaces 
contacts toward minima in the grasp error function using 
discrete probes [1] or a continuous sliding motion [14]. 

Grasp controllers descend an artificial potential, $\phi_g$, derived from wrench error,

$$\epsilon = \vec{\rho}^{\,T} \vec{\rho}, \qquad \vec{\rho} = \sum_{1 \le i \le n} \vec{w}_i, \qquad (6)$$

where $\vec{w}_i$ is the contact wrench applied by the $i$th contact, assuming no surface friction. $\vec{w}_i$ is calculated directly from tactile feedback by using the approach of Bicchi et al. to estimate contact location [15]. The control law converges when the contacts have been displaced to locations where the net applied wrench is minimized. If the minimum corresponds to zero net wrench, then, in the presence of friction, such a grasp achieves wrench closure because it fulfills the conditions for non-marginal equilibrium. Non-marginal equilibrium requires that the contact forces achieving zero net force lie strictly inside their corresponding friction cones, and it has been shown to be a sufficient condition for wrench closure [16].
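Equation 6 can be evaluated directly once contact positions and normals are known. The sketch below assumes each frictionless contact applies a unit force along its inward surface normal, with torque taken about the object origin, so that $\vec{w}_i = [\hat{n}_i;\ \vec{p}_i \times \hat{n}_i]$. In the paper, $\vec{w}_i$ is instead obtained from tactile feedback [15]; the unit-force convention and the example geometry are illustrative assumptions.

```python
import numpy as np


def contact_wrench(position, normal):
    """Frictionless contact wrench [f; p x f] for a unit force along the inward normal."""
    f = np.asarray(normal, dtype=float)
    f = f / np.linalg.norm(f)
    p = np.asarray(position, dtype=float)
    return np.concatenate([f, np.cross(p, f)])


def wrench_error(positions, normals):
    """epsilon = rho^T rho, where rho is the net wrench summed over all contacts (Eq. 6)."""
    rho = sum(contact_wrench(p, n) for p, n in zip(positions, normals))
    return float(rho @ rho)


# Antipodal contacts on a 5 cm sphere: forces and torques cancel, so epsilon = 0.
print(wrench_error([[0.05, 0, 0], [-0.05, 0, 0]], [[-1, 0, 0], [1, 0, 0]]))
# Contacts 90 degrees apart: the net force does not vanish, so epsilon > 0.
print(wrench_error([[0.05, 0, 0], [0, 0.05, 0]], [[-1, 0, 0], [0, -1, 0]]))
```

A grasp controller would displace the contacts along the object surface so as to reduce this quantity, converging where the net wrench is minimized.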

Two GRASP controllers are used: $\phi_g|^{\sigma_g(123)}_{\tau_g(123)}$ and $\phi_g|^{\sigma_g(12)}_{\tau_g(12)}$. The first uses three physical contacts to synthesize a grasp, while the second combines two physical contacts (out of three) into a virtual finger [17] that is considered to apply a single force that opposes a third physical contact.


B. Localize-reach-grasp 

The LOCALIZE-REACH-GRASP action schema represents the class of policies in which a LOCALIZE action is followed by a REACH and a GRASP action. It is defined over a set of four abstract states, $S' = \{000, 001, 011, 111\}$, and three abstract actions, $A' = \{\phi_l, \phi_{pos}, \phi_g\}$. The four abstract states represent the possible combinations of controller convergence by representing the artificial potentials, $\phi_l$, $\phi_{pos}$, and $\phi_g$, as bits (see Table I).

TABLE I
Artificial potentials and their corresponding state bits. States are represented as bit strings where a bit is set to 1 when the corresponding artificial potential is in the state.

Artificial Potential    Bit
$\phi_l$                001
$\phi_{pos}$            010
$\phi_g$                100

The underlying state and action space is defined as described in Section II. The state space is defined in terms of the artificial potentials,

$$\Phi_{lrg} = \{\phi_l, \phi_{pos}, \phi_g\}, \qquad (7)$$

and the set of sensor transforms,

$$\Sigma_{lrg} = \{\sigma_l\} \cup \{\sigma_p(y, x) \mid y \subseteq \{1,2,3\},\ x \in [0,1]\} \cup \{\sigma_r(y, \theta) \mid y \subseteq \{1,2,3\},\ \theta \in [0, \pi/2]\} \cup \{\sigma_g(y) \mid y \subseteq \{1,2,3\}\}, \qquad (8)$$

as described in Section IV-A. The set of underlying states, $S$, is a subset of $2^{\mathcal{R}}$ (see Equation 1), where $\mathcal{R} \subseteq \Phi_{lrg} \times \Sigma_{lrg}$ is the set of compatible artificial potentials and sensor transforms.

The above abstract and underlying state and action spaces suggest the following $g$ and $f$ functions: $g((\pi_n, \ldots, \pi_1)) = h(\pi_1)$, where $h(\phi_i|^{\sigma}_{\tau}) = \phi_i$, and $f(s) = \{\phi \in \Phi \mid \exists \sigma \in \Sigma \text{ s.t. } (\phi, \sigma) \in s\}$. $g$ essentially looks only at $\pi_1$ and “strips” it of its sensor and effector transforms to leave only a potential function. In a similar way, $f$ maps onto abstract states by “stripping” sensor transforms from the converged pairs of artificial potentials and sensor transforms in the underlying state.
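The two "stripping" maps can be sketched in a few lines. The sketch assumes that a composite action is stored as a tuple of (potential, sensor, effector) triples ordered like $(\pi_n, \ldots, \pi_1)$, so that $\pi_1$ is the last element, and that $f$ returns the bit string of Table I rather than a set of potentials. The string labels for sensors and effectors are hypothetical.

```python
BIT = {"phi_l": 0b001, "phi_pos": 0b010, "phi_g": 0b100}  # Table I


def g(action):
    """Abstract action: strip sensor and effector transforms from pi_1.

    A composite action is assumed to be a tuple (pi_n, ..., pi_1) of
    (potential, sensor, effector) triples, with pi_1 stored last.
    """
    potential, _sensor, _effector = action[-1]
    return potential


def f(state):
    """Abstract state: keep only the potentials of the converged (potential, sensor)
    pairs, encoded as the bit string of Table I."""
    bits = 0
    for potential, _sensor in state:
        bits |= BIT[potential]
    return format(bits, "03b")


reach = (("phi_rot", "sigma_r(123, pi/2)", "tau_r(123)"),
         ("phi_pos", "sigma_p(123, 0.5)", "tau_p(123)"))
print(g(reach))  # phi_pos
print(f({("phi_l", "sigma_l"), ("phi_pos", "sigma_p(123, 0.5)")}))  # 011
```

Because many underlying reaches (different $y$, $x$, $\theta$) map to the same abstract action, the learner is free to choose among them while still following the schema.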

Fig. 3. The LOCALIZE-REACH-GRASP action schema. The circles with binary numbers in them represent abstract states. The arrows represent abstract actions and possible transitions.

The LOCALIZE-REACH-GRASP action schema also defines a policy and transition function over the abstract state and action space, as illustrated in Figure 3. The policy is

$$\pi'(000) = \phi_l, \qquad \pi'(001) = \phi_{pos}, \qquad \pi'(011) = \phi_g, \qquad (9)$$

and the transition function is

$$T'(000, \phi_l) = 001, \qquad T'(001, \phi_{pos}) = 011, \qquad T'(011, \phi_g) = 111. \qquad (10)$$
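Equations 9 and 10 fit in two small lookup tables, and the success criterion of Equation 3 reduces, at the abstract level, to checking that each observed next state is the one $T'$ prescribes. The sketch below is a hypothetical encoding with bit strings as Python strings; it checks a recorded sequence of abstract states rather than controlling a robot.

```python
POLICY = {"000": "phi_l", "001": "phi_pos", "011": "phi_g"}  # Eq. 9
TRANSITION = {("000", "phi_l"): "001",                         # Eq. 10
              ("001", "phi_pos"): "011",
              ("011", "phi_g"): "111"}
GOAL = "111"


def step_succeeded(s_abstract, a_abstract, next_abstract):
    """An action succeeds iff the next abstract state is the one T' prescribes (Eq. 3)."""
    return TRANSITION.get((s_abstract, a_abstract)) == next_abstract


def run_schema(observed_states):
    """Check a sequence of observed abstract states against the schema.

    Returns True if every step follows pi' and T' and the trajectory ends in the goal.
    """
    for s, s_next in zip(observed_states, observed_states[1:]):
        if s not in POLICY or not step_succeeded(s, POLICY[s], s_next):
            return False
    return observed_states[-1] == GOAL


print(run_schema(["000", "001", "011", "111"]))  # True: localize, reach, grasp all succeed
print(run_schema(["000", "001", "001"]))         # False: the reach did not converge
```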
Fig. 4. Conditioning on eccentricity: the four bars in this graph plot the
maximum estimated probability of grasp success (for round and eccentric 
objects) when reaching to both a position and orientation, and when reaching 
to a position without specifying orientation. When attempting to grasp the 
round object, the algorithm learns that a reach that specifies either position 
and orientation or position alone will work. However, when attempting to 
grasp an eccentric object, the system learns that orientation is important. 


V. Experiments 

This paper characterizes the ability of schema structured learning to make appropriate grasp distinctions in two experiments, which show that it can learn to select different grasp strategies as a function of object eccentricity and orientation. This paper also shows that these contextual distinctions generalize from a small set of training objects to a much larger set of test objects. All experiments were performed using Dexter, the UMass bimanual humanoid robot [18]. Dexter consists of a 4-degree-of-freedom (DOF) BiSight head and two Barrett Technologies whole-arm manipulators (WAMs). Each Barrett WAM is equipped with a 3-finger, 4-DOF Barrett Hand. Mounted on the tip of each Barrett Hand finger is a 6-axis force-torque sensor.

A. Conditioning on object eccentricity, orientation, and length 

1) Eccentricity: Recall from Section IV that Dexter can 
choose to reach to a position without specifying orientation or 
it can specify both position and orientation. This experiment 
demonstrates that schema structured learning can discover 
when each of these two reach strategies is appropriate. In 
this experiment, Dexter alternately reached toward a vertically 
presented towel roll (10 cm diameter and 20 cm high) or a
round ball (16.5 cm diameter) 42 times. At the beginning
of each grasp trial, the object was placed in approximately 
the same tabletop location. Then, schema structured learning 
executed (using the localize-reach-grasp action schema) 
until either the absorbing state was reached or an action failed. 
When either of these events occurred, the system was reset 
and a new trial was started. For this experiment, Dexter was 
limited to executing only three-fingered grasps. In addition, the 
algorithm only considered object eccentricity, ignoring size, 
orientation, and position. 


The results given in Figure 4 show the maximum probability 
of successfully grasping the round object (the ball) and the 
eccentric object (the vertical towel roll) using both reach 
types. The two bars labeled “Position and Orientation” are 
the maximum probabilities of a successful grasp when the 
REACH controller specifies both position and orientation. The 
two bars labeled “Position” are the maximum probabilities of a
successful grasp given a REACH where only position offset was 
specified. Notice that for the eccentric object, a much higher 
probability of success can be achieved when both position 
and orientation offset are specified. In contrast, for the round 
object, it is possible to achieve high success rates using either 
type of REACH controller. 

2) Orientation: In the second experiment, Dexter learned to condition its choice of reach controller position reference based on object orientation. This experiment used LOCALIZE-REACH-GRASP-HOLD-LIFT, an augmented version of the LOCALIZE-REACH-GRASP action schema. After reaching and grasping the object, LOCALIZE-REACH-GRASP-HOLD-LIFT also holds the object and lifts it. In this experiment, a cracker box (measuring 5x5x10 cm with a mass of 280 g) was alternately presented to Dexter horizontally and vertically. After lifting the box, the hold action was only considered to have succeeded if all manipulator contacts remained in contact and the object did not exert a large moment on the manipulator, i.e., the object was grasped near its center of mass (COM).
After 60 trials, schema structured learning had learned to use different grasp strategies depending on whether the object was presented vertically or horizontally. Figure 5(a) shows that when the box was presented vertically, the probability of success was maximized when the manipulator was oriented perpendicular to the object major axis. However, note that the position of the manipulator along the major axis did not matter. Figure 5(b) shows that when the box was presented horizontally, it was necessary to grasp the box near its center as well as to use the correct orientation. This difference is due to the effect of the object COM on the success of lifting. When the box is horizontal, the COM can exert a large moment that can cause the object to drop if it is grasped far from its center. However, this is not a problem when the box is presented vertically.

Fig. 5. The results of learning to lift the cracker box when it is presented vertically, (a), versus when it is presented horizontally, (b). These contour plots illustrate the probability of a successful lift as a function of REACH position and orientation. Dexter learns that, in both poses, the manipulator should be oriented perpendicular to the object (in both plots, probability is maximized near the top of the graph). However, Dexter also learns that if the object is presented horizontally, it must be grasped near its center of mass (in (b), the probability is maximized on the left of the graph). If the horizontal object is grasped near one end, the object may twist out of the grasp and drop.

B. Generalization to New Objects: Bagging Groceries 

Although the previous experiments show that schema structured learning can discover shape- and pose-appropriate reach-grasp strategies, it is not yet clear whether these grasp skills generalize to new objects. In this experiment, Dexter learns to grasp a set of five training objects by distinguishing them based on object length, eccentricity, and orientation (see Section IV-A.1). The grasping skills learned from the five objects were evaluated by attempting to grasp a much larger set of 19 test objects that Dexter had not previously experienced.

The system was trained using the five objects shown in Figure 6. The butter cracker box (Figure 6(e)) was always presented horizontally. For each of the five training objects, schema structured learning learned to grasp and lift it over the course of approximately 60 trials. Dexter was constrained to grasp only with two virtual fingers, using the virtual-finger GRASP controller described in Section IV-A.

The LOCALIZE-REACH-GRASP-HOLD-LIFT skills learned in the context of the five training objects were tested on the 19 different test objects shown in Figure 7. For each test object, the LOCALIZE-REACH-GRASP-HOLD-LIFT action schema was executed 16 times: eight times without using the experience acquired from the training objects and eight times with this experience. During the eight executions that tested performance without experience, schema structured learning essentially selected random instantiations of the action schema. During the eight executions that did use the training data, the algorithm effectively interpolated (in the space of the four visual features) the action schema instantiation from among the neighboring training objects.

Figure 8 illustrates the results. In both graphs, the horizontal axis corresponds to the object number in Figure 7. Figure 8(a) shows the grasp error after REACH controller execution and before GRASP controller execution. A low grasp error indicates that the manipulator is close to a good grasp configuration. The dashed line in Figure 8(a) shows the mean initial grasp error for the eight LOCALIZE-REACH-GRASP-HOLD-LIFT trials that did not benefit from the skills learned on the training set. The solid line shows the mean initial error for the eight trials that did use the training data. Although it is not universally true, this graph shows that the average initial grasp error for many of the 19 test objects was lower when schema structured learning used previous training experience than when it did not. For 14 out of the 19 test objects, the mean performance without training was more than one standard deviation worse than the mean performance with training.

Figure 8(b) suggests a similar conclusion. This graph 
analyzes the value of training experience in terms of the 
probability of successfully holding and lifting the test object. 
A hold is considered successful only if all of the grasping 
contacts continue to apply the reference hold force and the 
contacts do not apply a large moment on the object. This is 
only true when a good grasp has been established close to the 
object center of mass. Figure 8(b) shows that, averaged over all 
19 objects, the probability of successfully grasping and lifting 
the object without using training experience is around 50%. 
However, this probability rises significantly when the training 
experience is used. In fact, the figure shows that the probability 
of successfully grasping and lifting the object almost always 
rises (except in one case) when the algorithm is allowed to 
use training experience. 





Fig. 6. The five training objects used in the grocery bagging experiment. 





Fig. 7. The 19 test objects used in the grocery bagging experiment. 



Fig. 8. Performance of grasping the 19 test objects with (the solid line) and without (the dashed line) previous experience grasping the five training objects. 
In both plots, the horizontal axis represents the object number from Figure 7. (a) plots the mean grasp error after executing the REACH and before executing 
the GRASP for the 19 test objects. The error bars plot one standard deviation above and below the mean. (b) plots the mean probability of successfully lifting the object.

This experiment demonstrates that it is possible to learn general reach-grasp skills based on experience with a limited set of objects and apply these skills to new objects. Although the experimenter selected the 19 test objects, they are a representative sampling of a large class of objects that can be found in most grocery stores.

VI. Conclusion

This paper takes a control-based approach to grasp synthesis, whereby the problem is recast as that of correctly sequencing and combining reach and grasp controllers. Instead of considering every possible sequence of controllers, consideration is limited to those sequences that match a generalized grasp strategy, encoded as an action schema. Using schema structured learning, the robot autonomously learns how to instantiate the generalized policy as a function of grasp context. This approach is experimentally demonstrated to be capable of learning suitable grasp strategies as a function of object eccentricity and orientation. In addition, it is shown that once grasp strategies have been learned for a representative set of objects, these strategies can improve grasp performance on other objects, even when those objects have not been experienced before.